Generalized Category Discovery via Token Manifold Capacity Learning
Tang, Luyao, Huang, Kunze, Chen, Chaoqi, Chen, Cheng
Generalized category discovery (GCD) is essential for improving deep learning models' robustness in open-world scenarios by clustering unlabeled data containing both known and novel categories. Traditional GCD methods focus on minimizing intra-cluster variations, often sacrificing manifold capacity, which limits the richness of intra-class representations. In this paper, we propose a novel approach, Maximum Token Manifold Capacity (MTMC), that prioritizes maximizing the manifold capacity of class tokens to preserve the diversity and complexity of data. MTMC leverages the nuclear norm of singular values as a measure of manifold capacity, ensuring that the representation of samples remains informative and well-structured. This method enhances the discriminability of clusters, allowing the model to capture detailed semantic features and avoid the loss of critical information during clustering. Through theoretical analysis and extensive experiments on coarse- and fine-grained datasets, we demonstrate that MTMC outperforms existing GCD methods, improving both clustering accuracy and the estimation of category numbers. The integration of MTMC leads to more complete representations, better inter-class separability, and a reduction in dimensional collapse, establishing MTMC as a vital component for robust open-world learning. Code is available at github.com/lytang63/MTMC.
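The abstract's capacity measure, the nuclear norm (sum of singular values) of a token representation matrix, can be illustrated in a few lines. This is a minimal sketch of the general idea, not the paper's implementation; the function name and the centering step are our own choices:

```python
import numpy as np

def nuclear_norm_capacity(tokens: np.ndarray) -> float:
    """Sum of singular values of a (num_tokens x dim) representation matrix.

    A larger nuclear norm means variance is spread across more directions,
    i.e. a richer, less collapsed manifold.
    """
    # Center the representations so the norm reflects spread, not the mean.
    centered = tokens - tokens.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return float(singular_values.sum())

rng = np.random.default_rng(0)
rich = rng.normal(size=(128, 64))  # variance in many directions
collapsed = rng.normal(size=(128, 1)) @ rng.normal(size=(1, 64))  # rank-1
assert nuclear_norm_capacity(rich) > nuclear_norm_capacity(collapsed)
```

Maximizing such a term during training penalizes dimensional collapse, which is the effect the abstract attributes to MTMC.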
Feature Learning beyond the Lazy-Rich Dichotomy: Insights from Representational Geometry
Chou, Chi-Ning, Le, Hang, Wang, Yichen, Chung, SueYeon
The ability to integrate task-relevant information into neural representations is a fundamental aspect of both biological and artificial intelligence. To enable theoretical analysis, recent work has examined whether a network learns task-relevant features (rich learning) or resembles a random feature model (or a kernel machine, i.e., lazy learning). However, this simple lazy-versus-rich dichotomy overlooks the possibility of various subtypes of feature learning that emerge from different architectures, learning rules, and data properties. Furthermore, most existing approaches emphasize weight matrices or neural tangent kernels, limiting their applicability to neuroscience because they do not explicitly characterize representations. In this work, we introduce an analysis framework based on representational geometry to study feature learning. Instead of analyzing what are the learned features, we focus on characterizing how task-relevant representational manifolds evolve during the learning process. In both theory and experiment, we find that when a network learns features useful for solving a task, the task-relevant manifolds become increasingly untangled. Moreover, by tracking changes in the underlying manifold geometry, we uncover distinct learning stages throughout training, as well as different learning strategies associated with training hyperparameters, uncovering subtypes of feature learning beyond the lazy-versus-rich dichotomy. Applying our method to neuroscience and machine learning, we gain geometric insights into the structural inductive biases of neural circuits solving cognitive tasks and the mechanisms underlying out-of-distribution generalization in image classification. Our framework provides a novel geometric perspective for understanding and quantifying feature learning in both artificial and biological neural networks.
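The claim that task-relevant manifolds "become increasingly untangled" during learning can be tracked with a simple geometric statistic. The score below is a coarse proxy of our own (between-class center distance over within-class spread), not the authors' manifold-capacity analysis:

```python
import numpy as np

def untangling_score(class_a: np.ndarray, class_b: np.ndarray) -> float:
    """Ratio of between-class center distance to average within-class spread.

    Higher values mean the two manifolds are easier to separate linearly;
    evaluating this at successive checkpoints gives a coarse untangling curve.
    """
    mu_a, mu_b = class_a.mean(axis=0), class_b.mean(axis=0)
    between = np.linalg.norm(mu_a - mu_b)
    within = 0.5 * (class_a.std(axis=0).mean() + class_b.std(axis=0).mean())
    return float(between / (within + 1e-12))

rng = np.random.default_rng(0)
# Synthetic stand-ins for early vs. late training representations.
early = untangling_score(rng.normal(0.0, 1.0, (200, 32)),
                         rng.normal(0.2, 1.0, (200, 32)))
late = untangling_score(rng.normal(0.0, 1.0, (200, 32)),
                        rng.normal(3.0, 1.0, (200, 32)))
assert late > early  # better-separated manifolds score higher
```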
The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models
Kirsanov, Artem, Chou, Chi-Ning, Cho, Kyunghyun, Chung, SueYeon
Decoder-only language models have the ability to dynamically switch between various computational tasks based on input prompts. Despite many successful applications of prompting, there is very limited understanding of the internal mechanism behind such flexibility. In this work, we investigate how different prompting methods affect the geometry of representations in these models. Employing a framework grounded in statistical physics, we reveal that various prompting techniques, while achieving similar performance, operate through distinct representational mechanisms for task adaptation. Our analysis highlights the critical role of input distribution samples and label semantics in few-shot in-context learning. We also demonstrate evidence of synergistic and interfering interactions between different tasks on the representational level. Our work contributes to the theoretical understanding of large language models and lays the groundwork for developing more effective, representation-aware prompting strategies.
Toward a Geometric Theory of Manifold Untangling
It has been hypothesized that the ventral stream processing for object recognition is based on a mechanism called cortically local subspace untangling. A mathematical abstraction of object recognition by the visual cortex is how to untangle the manifolds associated with different object categories. Such a manifold untangling problem is closely related to the celebrated kernel trick in metric space. In this paper, we conjecture that there is a more general solution to manifold untangling in the topological space without artificially defining any distance metric. Geometrically, we can either $embed$ a manifold in a higher dimensional space to promote selectivity or $flatten$ a manifold to promote tolerance. General strategies of both global manifold embedding and local manifold flattening are presented and connected with existing work on the untangling of image, audio, and language data. We also discuss the implications of manifold untangling for motor control and internal representations.
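The "embed to promote selectivity" strategy has a textbook instance: XOR-style data that no hyperplane can separate in the plane becomes linearly separable after lifting to a higher-dimensional feature space. This example is standard kernel-trick material, not taken from the paper:

```python
import numpy as np

def embed(x: np.ndarray) -> np.ndarray:
    """Lift 2-D points to 3-D by appending the product feature x1 * x2."""
    return np.column_stack([x, x[:, 0] * x[:, 1]])

# XOR-style data: the two classes are not linearly separable in the plane.
pts = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
labels = np.array([1, 1, -1, -1])

# In the embedded space a single hyperplane (normal along the new axis)
# untangles the classes: sign(x1 * x2) matches the labels exactly.
lifted = embed(pts)
assert np.all(np.sign(lifted[:, 2]) == labels)
```

The complementary strategy, flattening, instead reduces curvature of each manifold so that nearby variations of one object stay on one side of a decision boundary.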
Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks
Seguin, Landan, Ndirango, Anthony, Mishra, Neeli, Chung, SueYeon, Lee, Tyler
Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks. Almost all defense strategies achieve this invariance through adversarial training i.e. training on inputs with adversarial perturbations. Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood. Motivated by a recent study on learning robustness without input perturbations by distilling an AT model, we explore what is learned during adversarial training by analyzing the distribution of logits in AT models. We identify three logit characteristics essential to learning adversarial robustness. First, we provide a theoretical justification for the finding that adversarial training shrinks two important characteristics of the logit distribution: the max logit values and the "logit gaps" (difference between the logit max and next largest values) are on average lower for AT models. Second, we show that AT and standard models differ significantly on which samples are high or low confidence, then illustrate clear qualitative differences by visualizing samples with the largest confidence difference. Finally, we find learning information about incorrect classes to be essential to learning robustness by manipulating the non-max logit information during distillation and measuring the impact on the student's robustness. Our results indicate that learning some adversarial robustness without input perturbations requires a model to learn specific sample-wise confidences and incorrect class orderings that follow complex distributions.
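The two logit statistics the abstract defines, the max logit and the "logit gap" (max minus second-largest), are straightforward to compute from a batch of logits. A minimal sketch (the function name is ours; comparing AT vs. standard models requires the trained networks themselves):

```python
import numpy as np

def logit_stats(logits: np.ndarray):
    """Per-sample max logit and logit gap (max minus second-largest logit)."""
    top2 = np.sort(logits, axis=1)[:, -2:]  # two largest logits per row
    max_logit = top2[:, 1]
    logit_gap = top2[:, 1] - top2[:, 0]
    return max_logit, logit_gap

logits = np.array([[4.0, 1.0, 0.5],    # confident: large max, large gap
                   [1.2, 1.0, 0.9]])   # uncertain: small max, small gap
max_logit, gap = logit_stats(logits)
assert np.allclose(max_logit, [4.0, 1.2])
assert np.allclose(gap, [3.0, 0.2])
```

Per the abstract, adversarially-trained models show lower values of both statistics on average than standard models.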
On the geometry of generalization and memorization in deep neural networks
Stephenson, Cory, Padhy, Suchismita, Ganesh, Abhinav, Hui, Yue, Tang, Hanlin, Chung, SueYeon
This part of the gradient behaves similarly for permuted and unpermuted examples. In Eq. 25 we see that the contribution to the label-dependent part of the gradient from permuted examples vanishes for large datasets, while the contribution from unpermuted examples does not, provided the cross-correlation between input features and labels is nonzero. This suggests that with small weight initialization, the gradient descent dynamics initially ignores the labels of permuted examples. Figure A.1 shows a breakdown of how the two components of the gradient computed on both unpermuted and permuted examples evolve over the course of training for the different layers of the VGG16 model trained on CIFAR-100. We see that the label-dependent part behaves qualitatively differently for the unpermuted examples than for the permuted examples, as the permuted examples give close to zero contribution early in training, in agreement with Eq. 25. The label-independent part of the gradient shows similar trends between unpermuted and permuted examples, though in the final epochs, the unpermuted examples have a slightly larger label-independent gradient, indicating slightly greater model confidence on these examples. As the label-dependent and label-independent parts of the gradient have differing signs, they compete with each other and cancel when the loss is minimized, but they are not independently zero and in fact grow during training. The slightly larger label-independent gradient for unpermuted examples is balanced by a correspondingly slightly larger label-dependent gradient at the end of training.
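Eq. 25 itself is not reproduced in this excerpt, but the decomposition it describes matches the standard split of the softmax cross-entropy logit gradient, p - y, into a label-independent part p and a label-dependent part -y; we assume that correspondence in the sketch below:

```python
import numpy as np

def grad_parts(logits: np.ndarray, one_hot: np.ndarray):
    """Split the softmax cross-entropy logit gradient (p - y) into its
    label-independent part p and label-dependent part -y."""
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    return p, -one_hot  # full gradient is their sum: p - y

z = np.array([[2.0, 0.5, -1.0]])
y = np.array([[1.0, 0.0, 0.0]])
indep, dep = grad_parts(z, y)
full = indep + dep
# On the true class the two parts have opposite signs and compete; they
# cancel (full -> 0) only as the model becomes confident, i.e. p -> y.
assert indep[0, 0] > 0 and dep[0, 0] < 0 and full[0, 0] < 0
```

This opposite-sign structure is exactly the competition-and-cancellation behavior described in the passage: the full gradient vanishes at the loss minimum even though neither part is individually zero.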